Exploring rich intermediate representations for reconstructing 3D shapes from 2D images
Abstract
Recovering 3D voxelized shapes with fine details from single-view 2D images is an extremely challenging and ill-posed problem. Most existing methods learn the 3D reconstruction process by encoding the 3D shapes and the 2D images into the same low-dimensional latent vector, which lacks the capacity to capture fine details on the surfaces of 3D object shapes.
To address this issue, we propose to explore rich intermediate representations for 3D shape reconstruction using a newly designed network architecture.
Methodology
We first use a two-stream network to infer the depth map and the topology-specific mean shape from the given 2D image; together these form the intermediate representation prediction branch. The intermediate representations capture the global spatial structure and the visible surface geometry, both of which are important for reconstructing high-quality 3D shapes.
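To make this branch concrete, the following is a minimal PyTorch sketch of an intermediate representation prediction branch. It is illustrative only: the shared encoder with two prediction streams, the 128x128 input resolution, the 32^3 voxel resolution, and all layer sizes are assumptions rather than the exact architecture described here.

```python
import torch
import torch.nn as nn

class TwoStreamIntermediateNet(nn.Module):
    """Sketch of the intermediate-representation prediction branch:
    a shared 2D encoder with two streams, one predicting a depth map and
    one regressing a topology-specific mean shape as an occupancy grid.
    Layer sizes and resolutions are illustrative assumptions."""

    def __init__(self, voxel_res=32):
        super().__init__()
        # Shared 2D image encoder: 128x128 RGB -> 8x8 feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64x64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32x32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16x16
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # 8x8
        )
        # Stream 1: 2D decoder that upsamples back to a depth map.
        self.depth_decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # 128x128 depth
        )
        # Stream 2: fully connected head that regresses the mean shape
        # of the object's topology class as a voxel occupancy grid.
        self.mean_shape_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, 1024), nn.ReLU(),
            nn.Linear(1024, voxel_res ** 3), nn.Sigmoid(),
        )
        self.voxel_res = voxel_res

    def forward(self, image):
        feat = self.encoder(image)
        depth = self.depth_decoder(feat)  # (B, 1, 128, 128)
        mean_shape = self.mean_shape_head(feat).view(
            -1, 1, self.voxel_res, self.voxel_res, self.voxel_res)
        return depth, mean_shape

# Example: one 128x128 RGB image in, a depth map and a 32^3 mean shape out.
net = TwoStreamIntermediateNet()
depth, mean_shape = net(torch.randn(1, 3, 128, 128))
print(depth.shape, mean_shape.shape)
```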
Based on the obtained intermediate representations, a novel shape transformation network is then proposed to reconstruct the fine details of the complete 3D object shape.
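The sketch below shows one plausible way such a shape transformation network could consume the two intermediate representations: encode the predicted depth map, fuse it with the mean-shape voxel grid, and predict a refined occupancy grid. The fusion scheme (tiling a global depth feature over every voxel) and all layer sizes are assumptions made for illustration, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class ShapeTransformationNet(nn.Module):
    """Sketch: refine the topology-specific mean shape into a detailed
    occupancy grid, conditioned on features from the predicted depth map.
    The fusion and refinement design here is an illustrative assumption."""

    def __init__(self, voxel_res=32, feat_dim=64):
        super().__init__()
        # Encode the predicted depth map into a global feature vector.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # 3D refinement: mean-shape occupancy + tiled depth feature
        # -> residual occupancy update at every voxel.
        self.refiner = nn.Sequential(
            nn.Conv3d(1 + feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 3, padding=1),
        )
        self.voxel_res = voxel_res

    def forward(self, depth, mean_shape):
        b = mean_shape.shape[0]
        f = self.depth_encoder(depth)  # (B, feat_dim)
        # Broadcast the depth feature to every voxel location.
        f3d = f.view(b, -1, 1, 1, 1).expand(-1, -1, self.voxel_res,
                                            self.voxel_res, self.voxel_res)
        x = torch.cat([mean_shape, f3d], dim=1)
        # Predict a bounded residual and add it to the mean shape.
        refined = mean_shape + torch.tanh(self.refiner(x))
        return refined.clamp(0.0, 1.0)

# Example: refine a random 32^3 mean shape with a 128x128 depth map.
net = ShapeTransformationNet()
out = net(torch.randn(1, 1, 128, 128), torch.rand(1, 1, 32, 32, 32))
print(out.shape)  # (1, 1, 32, 32, 32)
```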
The key advantages of our approach:
1. Intermediate Representations: Capture global spatial structure and visible surface geometry, providing crucial information for high-quality 3D reconstruction.
2. Shape Transformation Network: Reconstructs fine details of the complete 3D object shapes based on intermediate representations.
Experimental Results
Experimental results on the challenging ShapeNet and Pix3D datasets show that our approach outperforms existing state-of-the-art methods.
The results demonstrate that exploring rich intermediate representations significantly improves the quality of 3D shape reconstruction from single-view 2D images, especially in capturing fine surface details.
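For reference, voxel intersection-over-union (IoU) is the quantitative metric most commonly reported on ShapeNet-style voxel reconstruction benchmarks; the section above does not specify the exact metric, so the following is a minimal sketch under that assumption, with the 0.5 binarization threshold as a conventional but illustrative choice.

```python
import torch

def voxel_iou(pred_occupancy, gt_occupancy, threshold=0.5):
    """Intersection-over-Union between predicted and ground-truth occupancy
    grids, computed per shape. Grids are binarized at `threshold`."""
    pred = pred_occupancy >= threshold
    gt = gt_occupancy >= threshold
    intersection = (pred & gt).sum(dim=(-3, -2, -1)).float()
    union = (pred | gt).sum(dim=(-3, -2, -1)).float()
    return intersection / union.clamp(min=1.0)

# Example on random 32^3 grids (batch of 2).
iou = voxel_iou(torch.rand(2, 1, 32, 32, 32),
                (torch.rand(2, 1, 32, 32, 32) > 0.5).float())
print(iou.shape)  # torch.Size([2, 1])
```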